Data Mining - Pendahuluan Social Network Analysis
Outline SNA :¶
- Review Graph
- Centrality Analysis
- Graph Partitioning
- Community Detection
- Discussion on Final Project
Sejarah
- 1736: Leonhard Euler - Basel, 1707-St. Petersburg, 1786
- Mampu mengungkap misteri Jembatan Konigsberg di Prussia (saat ini Kaliningrad, Russia)
Graph (Definisi)
- Suatu Graph G adalah koleksi atau pasangan dari dua himpunan V dan E dengan
- V = V(G) = himpunan verteks atau simpul atau node.
- E = E(G) = himpunan edge atau ruas atau sisi.
Graph Types
Shortest Path and Cycles
Degree/Derajat Vertex
Bipartite Graph
Graph Representation
Graph Isomorphism
Pemodelan Graph untuk Penyelesaian Masalah
Graph Applications
Graph in Social Media Analytic
In [1]:
# 1. Mendefinisikan Graph (kosong)
import networkx as nx
G = nx.Graph()
G
Out[1]:
<networkx.classes.graph.Graph at 0x27e85f9e200>
In [2]:
V = [1, 2, 7, 9, 12, 19] # Bisa juga string, misal "A" atau nama "Budi"
E = [(1,2), (2,19), (9,2), (9,1), (2,8), (8,10), (12,7),(7,2), (7,9)] # Perhatikan "8" dan "10" TIDAK ADA di V
G.add_nodes_from(V)
G.add_edges_from(E)
print('Banyak vertex = ', G.number_of_nodes())
print('Banyak Edges = ', G.number_of_edges())
Banyak vertex = 8 Banyak Edges = 9
In [3]:
# Draw Graph # https://networkx.github.io/documentation/networkx-1.10/reference/drawing.html#layout
import matplotlib.pyplot as plt
pos = nx.spring_layout(G) # Spring LayOut
nx.draw_networkx_nodes(G,pos, alpha=0.1,node_color='blue',node_size=800) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=1) # Gambar edges
nx.draw_networkx_labels(G,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [4]:
# Ingat Graph Isomorphism?
pos=nx.circular_layout(G) # Circular Layout
nx.draw_networkx_nodes(G,pos) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2) # Gambar edges
nx.draw_networkx_labels(G,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [5]:
pos = nx.random_layout(G) # Random
nx.draw_networkx_nodes(G,pos, alpha=0.2,node_color='blue',node_size=600) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2) # Gambar edges
nx.draw_networkx_labels(G,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [6]:
shell = [[1,2,7,9],[12,19,8,10]]
pos = nx.shell_layout(G, shell) # Shells
nx.draw_networkx_nodes(G,pos, alpha=0.2,node_color='blue',node_size=600) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2) # Gambar edges
nx.draw_networkx_labels(G,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [7]:
pos=nx.spectral_layout(G) # Spectral
nx.draw_networkx_nodes(G,pos, alpha=0.2,node_color='blue',node_size=600) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2) # Gambar edges
nx.draw_networkx_labels(G,pos) #Gambar Label Nodes
plt.show() # Show the graph
PyGraphVis (Demonstration)¶
In [8]:
from graphviz import Digraph
f = Digraph('finite_state_machine', filename='fsm.gv')
f.attr(rankdir='LR', size='8,5')
f.attr('node', shape='doublecircle')
f.node('LR_0'); f.node('LR_3')
f.node('LR_4');f.node('LR_8')
f.attr('node', shape='circle')
f.edge('LR_0', 'LR_2', label='SS(B)'); f.edge('LR_0', 'LR_1', label='SS(S)')
f.edge('LR_1', 'LR_3', label='S($end)'); f.edge('LR_2', 'LR_6', label='SS(b)')
f.edge('LR_2', 'LR_5', label='SS(a)'); f.edge('LR_2', 'LR_4', label='S(A)')
f.edge('LR_5', 'LR_7', label='S(b)'); f.edge('LR_5', 'LR_5', label='S(a)')
f.edge('LR_6', 'LR_6', label='S(b)'); f.edge('LR_6', 'LR_5', label='S(a)')
f.edge('LR_7', 'LR_8', label='S(b)'); f.edge('LR_7', 'LR_5', label='S(a)')
f.edge('LR_8', 'LR_6', label='S(b)'); f.edge('LR_8', 'LR_5', label='S(a)')
f
Out[8]:
In [9]:
# Hapus Vertex/Edges
try:
G.remove_node(19)
except Exception as err_:
print(err_)
try:
G.remove_edge(1, 2)
except Exception as err_:
print(err_)
pos=nx.spectral_layout(G) # Spectral
nx.draw_networkx_nodes(G,pos, alpha=0.1,node_color='blue',node_size=800) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2, alpha=0.1) # Gambar edges
nx.draw_networkx_labels(G,pos, font_size=14) #Gambar Label Nodes
plt.show() # Show the graph
In [10]:
# Degree
G.degree[10]
Out[10]:
1
(Shortest) Path
In [11]:
print(nx.shortest_path(G, source=9, target=10))
pos=nx.spectral_layout(G)
nx.draw_networkx_nodes(G,pos, alpha=0.1,node_color='blue',node_size=800) # Gambar Vertex
nx.draw_networkx_edges(G,pos,width=2, alpha=0.1) # Gambar edges
nx.draw_networkx_labels(G,pos, font_size=14) #Gambar Label Nodes
plt.show() # Show the graph
[9, 2, 8, 10]
Graph From Social media
Mentions, Followers, Friends
In [12]:
# Loading Data
import pandas as pd, re, operator, numpy as np
from tqdm import tqdm
file_ = 'data/sample-twitter-data.csv'
try:
import google.colab; IN_COLAB = True
!mkdir data
!wget -P data/ https://raw.githubusercontent.com/taudataanalytics/Data-Mining--Penambangan-Data--Ganjil-2024/master/{file_}
except:
IN_COLAB = False
Tweets = pd.read_csv(file_)
Tweets.head()
Out[12]:
| id_str | username | screen_name | created_at | friends_count | followers_count | favourited_count | statuses_count | full_text | description | text_created_at | text_favourites_count | text_retweets_count | label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.46E+18 | Aqilacute10 | Aqila_cute | Mon Nov 01 19:00:53 +0000 2021 | 188 | 745 | 1671 | 2222 | Jika kau yakin dan percaya, maka mimpimu pun a... | Lulusan al-Azhar · Qari' · Pecinta Quran\n\nDo... | Fri Jul 08 02:02:23 +0000 2022 | 5 | 2 | 1 |
| 1 | 1.50E+18 | Kemilau16104667 | Kemilau | Tue Feb 22 23:11:01 +0000 2022 | 343 | 145 | 2 | 630 | Pemerintah serahkan pengelolaan dua menara rus... | NaN | Sat Jul 16 00:07:32 +0000 2022 | 0 | 0 | 1 |
| 2 | 163586591 | nasehatmuslim | Baik Berkata & Bertindak | Tue Jul 06 20:06:35 +0000 2010 | 562 | 662 | 5943 | 15356 | Rasulullah Muhammad bersabda,\n\nTidak boleh m... | Pengusaha, peneliti, dosen, wakil ketua, dai, ... | Mon Jul 18 14:45:36 +0000 2022 | 1 | 0 | 1 |
| 3 | 1.43E+18 | putrisrikandi_ | srikandi_ | Mon Sep 06 08:51:51 +0000 2021 | 444 | 376 | 648 | 7204 | Pemerintah lakukan serah Terima status penggun... | 🇮🇩🇮🇩🇮🇩🇮🇩🇮🇩 | Sat Jul 16 01:19:24 +0000 2022 | 0 | 0 | 1 |
| 4 | 1.53E+18 | arisa_fadillah | Fadillah Arisa | Thu Jun 09 03:39:10 +0000 2022 | 31 | 22 | 0 | 21 | PT PLN (Persero) menerima kompensasi dari peme... | gemuk gak kurus | Fri Jul 08 01:53:34 +0000 2022 | 0 | 1 | 1 |
In [13]:
# Draw the Tweet Graph
G=nx.Graph()
for i, tweet in tqdm(Tweets.iterrows()):
if tweet.screen_name not in G.nodes():
G.add_node(tweet.screen_name)
mentionS = re.findall("@([a-zA-Z0-9]{1,15})", tweet['full_text'])
for mention in mentionS:
if "." not in mention: #skipping emails
usr = mention.replace("@",'').strip()
if usr not in G.nodes():
G.add_node(usr)
G.add_edge(tweet.screen_name, usr)
Nn=G.number_of_nodes();Ne=G.number_of_edges()
print('Finished. There are %d nodes and %d edges in the Graph.' %(Nn,Ne))
3000it [00:00, 11932.91it/s]
Finished. There are 3607 nodes and 2569 edges in the Graph.
In [14]:
# Latihan dengan layout Graph yang lain
fig = plt.figure()
fig.add_subplot(111)
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, alpha=0.2, node_color='blue', node_size=10)
#nx.draw_networkx_labels(G, pos)
nx.draw_networkx_edges(G, pos, width=1)
plt.show()
I. Centrality Analysis
Bertujuan untuk menemukan pengguna yang paling berpengaruh dalam suatu topik pembicaraan di media sosial. Analisanya biasanya dilakukan melalui data graph dari hubungan jaringan pertemanan (follower/friend) antar pengguna atau komunikasi antar pengguna (mentions).
In [15]:
# https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.degree_centrality.html
N = 10
ranking = nx.degree_centrality(G)
important_nodes = sorted(ranking.items(), key=operator.itemgetter(1))[::-1]#[0:Nimportant]
Mstd = 1 # 1 standard Deviation CI
data = np.array([n[1] for n in important_nodes])
out = len(data[abs(data - np.mean(data)) > Mstd * np.std(data)]) # outlier within m stDev interval
if out>N:
dnodes = [n[0] for n in important_nodes[:N]]
print('Influencial Users: {0}'.format(str(dnodes)))
else:
dnodes = [n[0] for n in important_nodes[:out]]
print('Influencial Users: {0}'.format(str(important_nodes[:out])))
Gt = G.subgraph(dnodes)
Influencial Users: ['brin', 'Gerindra', 'prabowo', 'Hasbil', 'DivHumas', 'ListyoSigitP', 'gerindra', 'Psadt', 'al', 'ekowboy2']
In [16]:
fig = plt.figure()
fig.add_subplot(111)
pos = nx.circular_layout(Gt)
nx.draw_networkx_nodes(Gt, pos, alpha=0.2, node_color='blue', node_size=600)
nx.draw_networkx_labels(Gt, pos)
nx.draw_networkx_edges(Gt, pos, width=1)
plt.show()
Closeness Centrality
In [17]:
# https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html
N = 10
ranking = nx.closeness_centrality(G)
important_nodes = sorted(ranking.items(), key=operator.itemgetter(1))[::-1]#[0:Nimportant]
Mstd = 1 # 1 standard Deviation CI
data = np.array([n[1] for n in important_nodes])
out = len(data[abs(data - np.mean(data)) > Mstd * np.std(data)]) # outlier within m stDev interval
if out>N:
dnodes = [n[0] for n in important_nodes[:N]]
print('Influencial Users: {0}'.format(str(dnodes)))
else:
dnodes = [n[0] for n in important_nodes[:out]]
print('Influencial Users: {0}'.format(str(important_nodes[:out])))
Gt = G.subgraph(dnodes)
fig = plt.figure()
fig.add_subplot(111)
pos = nx.circular_layout(Gt)
nx.draw_networkx_nodes(Gt, pos, alpha=0.2, node_color='blue', node_size=600)
nx.draw_networkx_labels(Gt, pos)
nx.draw_networkx_edges(Gt, pos, width=1)
plt.show()
Influencial Users: ['brin', 'blegedezLU', '🐣🐔🐓 VAXXED 3rd SHOT!', 'Pak Afiq', 'WeePHa Porever', 'KaderMilitan', "Maulin Ni'am", 'Supriyadi', '#PJMenangOmdo', 'Lantjar Djaya']
C:\anaconda\envs\Teaching\lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 128035 (\N{HATCHING CHICK}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
C:\anaconda\envs\Teaching\lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 128020 (\N{CHICKEN}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
C:\anaconda\envs\Teaching\lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 128019 (\N{ROOSTER}) missing from font(s) DejaVu Sans.
fig.canvas.print_figure(bytes_io, **kw)
Betweenness Centrality
In [18]:
# https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
N = 10
ranking = nx.betweenness_centrality(G)
important_nodes = sorted(ranking.items(), key=operator.itemgetter(1))[::-1]#[0:Nimportant]
Mstd = 1 # 1 standard Deviation CI
data = np.array([n[1] for n in important_nodes])
out = len(data[abs(data - np.mean(data)) > Mstd * np.std(data)]) # outlier within m stDev interval
if out>N:
dnodes = [n[0] for n in important_nodes[:N]]
print('Influencial Users: {0}'.format(str(dnodes)))
else:
dnodes = [n[0] for n in important_nodes[:out]]
print('Influencial Users: {0}'.format(str(important_nodes[:out])))
Gt = G.subgraph(dnodes)
fig = plt.figure()
fig.add_subplot(111)
pos = nx.circular_layout(Gt)
nx.draw_networkx_nodes(Gt, pos, alpha=0.2, node_color='blue', node_size=600)
nx.draw_networkx_labels(Gt, pos)
nx.draw_networkx_edges(Gt, pos, width=1)
plt.show()
Influencial Users: ['brin', 'dara', 'msaid', 'ListyoSigitP', 'Gerindra', 'prabowo', 'Istiqomah', 'DivHumas', 'B', 'Jonidi']
In [19]:
N = 10
phi = 1.618033988749895 # largest eigenvalue of adj matrix
ranking = nx.katz_centrality_numpy(G,1/phi)
important_nodes = sorted(ranking.items(), key=operator.itemgetter(1))[::-1]#[0:Nimportant]
Mstd = 1 # 1 standard Deviation CI
data = np.array([n[1] for n in important_nodes])
out = len(data[abs(data - np.mean(data)) > Mstd * np.std(data)]) # outlier within m stDev interval
if out>N:
dnodes = [n[0] for n in important_nodes[:N]]
print('Influencial Users: {0}'.format(str(dnodes)))
else:
dnodes = [n[0] for n in important_nodes[:out]]
print('Influencial Users: {0}'.format(str(important_nodes[:out])))
Gt = G.subgraph(dnodes)
fig = plt.figure()
fig.add_subplot(111)
pos = nx.circular_layout(Gt)
nx.draw_networkx_nodes(Gt, pos, alpha=0.2, node_color='blue', node_size=600)
nx.draw_networkx_labels(Gt, pos)
nx.draw_networkx_edges(Gt, pos, width=1)
plt.show()
Influencial Users: ['PresidenKopi', 'keuangannews', 'El', 'penegak kejujuran', 'Kemenparekraf', 'Komisaris Politik', 'PutraErlangga95', 'LaNyalla Academia (LNM) #hapuspt20persen', 'wishnutama', 'visiau']
In [20]:
# Menggunakan centrality measure untuk merubah tampilan Graph
## Contoh dengan graph generators https://networkx.github.io/documentation/networkx-1.10/reference/generators.html
#g = nx.dorogovtsev_goltsev_mendes_graph(3)
g = nx.karate_club_graph()
pos = nx.spring_layout(g) # Spring LayOut
nx.draw_networkx_nodes(g,pos, alpha=0.2,node_color='blue',node_size=600) # Gambar Vertex
nx.draw_networkx_edges(g,pos,width=2,alpha=0.1) # Gambar edges
nx.draw_networkx_labels(g,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [21]:
# Menggunakan Centrality measure (misal degree) untuk merubah ukuran node
K = 100 # Scale factor
d = nx.degree(g)
d = [d[node]*K for node in g.nodes()]
pos = nx.spring_layout(g) # Spring LayOut
nx.draw_networkx_nodes(g,pos,node_size=d) # Gambar Vertex
nx.draw_networkx_edges(g,pos,width=2,alpha=0.15) # Gambar edges
nx.draw_networkx_labels(g,pos) #Gambar Label Nodes
plt.show() # Show the graph
In [22]:
# Menggunakan tingkat "kepentingan" sebagai warna
ranking = nx.degree_centrality(g)
warna = list(ranking.values())
print(warna)
[0.48484848484848486, 0.2727272727272727, 0.30303030303030304, 0.18181818181818182, 0.09090909090909091, 0.12121212121212122, 0.12121212121212122, 0.12121212121212122, 0.15151515151515152, 0.06060606060606061, 0.09090909090909091, 0.030303030303030304, 0.06060606060606061, 0.15151515151515152, 0.06060606060606061, 0.06060606060606061, 0.06060606060606061, 0.06060606060606061, 0.06060606060606061, 0.09090909090909091, 0.06060606060606061, 0.06060606060606061, 0.06060606060606061, 0.15151515151515152, 0.09090909090909091, 0.09090909090909091, 0.06060606060606061, 0.12121212121212122, 0.09090909090909091, 0.12121212121212122, 0.12121212121212122, 0.18181818181818182, 0.36363636363636365, 0.5151515151515151]
In [23]:
pos = nx.spring_layout(g) # Spring LayOut
nx.draw_networkx_nodes(g,pos, node_color=warna,node_size=d) # Gambar Vertex
nx.draw_networkx_edges(g,pos,width=2,alpha=0.1) # Gambar edges
nx.draw_networkx_labels(g,pos) #Gambar Label Nodes
plt.show() # Show the graph
II. Community Detection (CD)
CD dilakukan pada data jaringan media sosial untuk menemukan komunitas-komunitas dalam pertemanan atau pembicaraan di media sosial. Secara sederhana CD dapat dimengerti sebagai proses "semacam clustering" (pengelompokan) , namun atas suatu graph.
Bipartition (Bisection) Partitioning¶
- This algorithm paritions a network into two sets by iteratively swapping pairs of nodes to reduce the edge cut between the two sets.
- https://www.youtube.com/watch?v=MMlf66PQdN8
- Paper: Kernighan, B. W.; Lin, Shen (1970). “An efficient heuristic procedure for partitioning graphs.” Bell Systems Technical Journal 49: 291–307. Oxford University Press 2011.
In [24]:
B = nx.algorithms.community.kernighan_lin_bisection(g)
B
Out[24]:
({8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33},
{0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 16, 17, 19, 21})
In [25]:
warna = []
for v in B[0]:
warna.append(1)
for v in B[1]:
warna.append(2)
In [26]:
pos = nx.shell_layout(g, B)
nx.draw_networkx_nodes(g,pos, node_color=warna,node_size=d) # Gambar Vertex
nx.draw_networkx_edges(g,pos,width=2,alpha=0.1) # Gambar edges
nx.draw_networkx_labels(g,pos) #Gambar Label Nodes
plt.show() # Show the graph
Graph Clustering via "Modularity"¶
- Modules biasa disebut juga groups, clusters atau communities
- Terdapat berbagai cara dalam menghitung "Modularity" (contoh dibawah)
- Graph with high modularity have dense connections between the nodes within "modules" but sparse connections between nodes in different modules.
- Salah satu metodenya : Greedy Modularity Maximization (GMM)
- GMM begins with each node in its own community and joins the pair of communities that most increases modularity until no such pair exists.
- Clauset, A., Newman, M. E., & Moore, C. “Finding community structure in very large networks.” Physical Review E 70(6), 2004.
- Other resources for study: https://slideplayer.com/slide/7050174/
In [27]:
# WARNING!!!... Hanya bisa jika networkX versi 2.2 ke atas
M = nx.algorithms.community.greedy_modularity_communities(g)
print(M)
[frozenset({8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33}), frozenset({1, 2, 3, 7, 9, 12, 13, 17, 21}), frozenset({0, 16, 19, 4, 5, 6, 10, 11})]
In [28]:
W = []
warna = 1
for module in M:
for node in module:
W.append(warna)
warna = warna +1
print(W)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
In [29]:
pos = nx.shell_layout(g, M)
nx.draw_networkx_nodes(g,pos, node_color=W,node_size=d) # Gambar Vertex
nx.draw_networkx_edges(g,pos,width=2,alpha=0.1) # Gambar edges
nx.draw_networkx_labels(g,pos) #Gambar Label Nodes
plt.show() # Show the graph